Goal 3: US COVID Data

Data Prep

In order to prep our data from our EDA we will load, transform the date, remove all zero row from the variable we are testing hospitalizedCurrently, and sort from oldest to newest. This will allow us to easily work through our multivariate time series evaluations.

us <- read.csv("https://raw.githubusercontent.com/JaclynCoate/6373_Time_Series/master/TermProject/Data/USdaily.7.26.20.csv", header = T, strip.white = T)
us <- transform(us, date = as.Date(as.character(date), "%Y%m%d"))
us <- subset(us, select = -c(states, dateChecked, hospitalized, lastModified, total, posNeg, totalTestResultsIncrease, hash))
us[is.na(us)] <- 0
us = us[order(as.Date(us$date, format = "%Y%m%d")),]
head(us)
##           date positive negative pending hospitalizedCurrently
## 187 2020-01-22        2        0       0                     0
## 186 2020-01-23        2        0       0                     0
## 185 2020-01-24        2        0       0                     0
## 184 2020-01-25        2        0       0                     0
## 183 2020-01-26        2        0       0                     0
## 182 2020-01-27        2        0       0                     0
##     hospitalizedCumulative inIcuCurrently inIcuCumulative
## 187                      0              0               0
## 186                      0              0               0
## 185                      0              0               0
## 184                      0              0               0
## 183                      0              0               0
## 182                      0              0               0
##     onVentilatorCurrently onVentilatorCumulative recovered death
## 187                     0                      0         0     0
## 186                     0                      0         0     0
## 185                     0                      0         0     0
## 184                     0                      0         0     0
## 183                     0                      0         0     0
## 182                     0                      0         0     0
##     totalTestResults deathIncrease hospitalizedIncrease negativeIncrease
## 187                2             0                    0                0
## 186                2             0                    0                0
## 185                2             0                    0                0
## 184                2             0                    0                0
## 183                2             0                    0                0
## 182                2             0                    0                0
##     positiveIncrease
## 187                0
## 186                0
## 185                0
## 184                0
## 183                0
## 182                0

Stationarity: Current Hospitalizations

It is difficult to assume stationarity for this data due to multiple factors. We are working under the assumption that COVID is a novel virus and cases as well as hospitalizations will eventually return to zero. This being said our current modeling techniques do things such as return to the median or mimic the previously seen trends. Also, we see a heavy wandering trend in both new cases and hospitalization would be dependent on this as well as time. We will review the data and see what, if any, non-stationary components reveal themselves and model the data accordingly.

Current Hospitalization Realization

Traits:

  • Heavy wandering behavior
  • What appears to be some noise that could be pseudo-cyclic behavior hidden by the large numbers.
ggplot(data = us, aes(x=date, y=hospitalizedCurrently))+
  geom_line(color="orange")+
  labs(title = "Current COVID Hospitalized Cases US", y = "Thousands", x = "") +
    theme_fivethirtyeight()

Independent Variable Review

When reviewing the variables and the scatter plot from our previous EDA we can see that there are some correlations that were expected, for example currentHospitalizations is correlated with the variables that reflect ICU and Ventilator patients. These metrics wouldn’t exist without hospitalizations. These variables also cannot help up predict hospitalizations because they after occurrences. So, we will actually be leveraging variables such as positive increase in order to see if there is some sort of correlation between hospitalized patients and the number of positive cases.

  • Positive Increase Trend
ggplot(data = us, aes(x=date, y=positiveIncrease))+
  geom_line(color="orange")+
  labs(title = "Increase in COVID Cases US", y = "Thousands", x = "") +
    theme_fivethirtyeight()

Multivariate MLR w/ Correlated Errors for Currently Hospitalized Patients

Forecast Independent Variables: Increase Positive Cases

  1. Evaluation: Slowly dampening ACF and heavy wandering consistent with a (1-B) factor. As well as frequency peaks at 0 and .14. Consistent with (1-B) and seasonality component of 7.
#a
plotts.sample.wge(us$positiveIncrease)

## $autplt
##  [1] 1.0000000 0.9740745 0.9439116 0.9109437 0.8869924 0.8711719 0.8605261
##  [8] 0.8462935 0.8144161 0.7772852 0.7356766 0.7045767 0.6829609 0.6646632
## [15] 0.6432819 0.6088202 0.5688872 0.5293417 0.5001353 0.4761589 0.4592724
## [22] 0.4426271 0.4148783 0.3789333 0.3458221 0.3182895
## 
## $freq
##  [1] 0.005347594 0.010695187 0.016042781 0.021390374 0.026737968
##  [6] 0.032085561 0.037433155 0.042780749 0.048128342 0.053475936
## [11] 0.058823529 0.064171123 0.069518717 0.074866310 0.080213904
## [16] 0.085561497 0.090909091 0.096256684 0.101604278 0.106951872
## [21] 0.112299465 0.117647059 0.122994652 0.128342246 0.133689840
## [26] 0.139037433 0.144385027 0.149732620 0.155080214 0.160427807
## [31] 0.165775401 0.171122995 0.176470588 0.181818182 0.187165775
## [36] 0.192513369 0.197860963 0.203208556 0.208556150 0.213903743
## [41] 0.219251337 0.224598930 0.229946524 0.235294118 0.240641711
## [46] 0.245989305 0.251336898 0.256684492 0.262032086 0.267379679
## [51] 0.272727273 0.278074866 0.283422460 0.288770053 0.294117647
## [56] 0.299465241 0.304812834 0.310160428 0.315508021 0.320855615
## [61] 0.326203209 0.331550802 0.336898396 0.342245989 0.347593583
## [66] 0.352941176 0.358288770 0.363636364 0.368983957 0.374331551
## [71] 0.379679144 0.385026738 0.390374332 0.395721925 0.401069519
## [76] 0.406417112 0.411764706 0.417112299 0.422459893 0.427807487
## [81] 0.433155080 0.438502674 0.443850267 0.449197861 0.454545455
## [86] 0.459893048 0.465240642 0.470588235 0.475935829 0.481283422
## [91] 0.486631016 0.491978610 0.497326203
## 
## $db
##  [1]  14.3997611  15.8955909   7.7642205   8.7050186   3.8198773
##  [6]  -1.1863576   2.6815084  -0.4667544  -1.0582274  -3.9729372
## [11]  -4.4388886  -5.3739230  -7.0574666  -3.9001118  -5.3732547
## [16]  -5.9203161  -6.0265835  -6.3073466  -8.7427682  -8.5315257
## [21]  -8.1694048  -8.8726978  -6.2228751  -5.6255282  -5.7869081
## [26]  -2.3190255  -0.6895940 -18.2510780 -12.7252753 -17.9895004
## [31] -13.1495229 -17.1809062 -12.9879824 -18.5885318 -14.0950716
## [36] -19.2823521 -16.1802599 -22.2193540 -17.1778910 -15.9864725
## [41] -13.8720977 -14.3606703 -12.3482825 -19.0159734 -12.8299009
## [46] -16.1634117 -19.2402721 -16.9940391 -19.9193164 -24.4462952
## [51] -12.2184745 -14.1347405 -11.1407485 -19.5608640 -22.3406178
## [56] -26.2948840 -15.9219942 -19.7896219 -16.3429699 -16.6545137
## [61] -13.2991307 -20.9465604 -12.5392646 -23.3015328 -21.5282804
## [66] -17.4748203 -26.6390880 -24.5923917 -17.5647166 -19.7282494
## [71] -17.6845420 -23.0825883 -19.0743556 -19.4958696 -19.5138333
## [76] -20.4270518 -18.7343253 -16.8982195 -14.1306252 -14.2054286
## [81] -15.0466299 -13.4474119 -14.7662082 -15.2848488 -14.7767269
## [86] -17.2543647 -17.5524024 -16.7680943 -23.6931393 -19.6888637
## [91] -22.9328710 -17.1805763 -15.7422487
## 
## $dbz
##  [1]  12.1744244  11.8068871  11.1912920  10.3234897   9.1987290
##  [6]   7.8134517   6.1690554   4.2794940   2.1855400  -0.0227779
## [11]  -2.1856123  -4.0930116  -5.5814407  -6.6274820  -7.3078903
## [16]  -7.6988921  -7.8526856  -7.8271468  -7.6919809  -7.5061868
## [21]  -7.3030888  -7.0977786  -6.9055053  -6.7545171  -6.6859448
## [26]  -6.7448940  -6.9710578  -7.3933729  -8.0283548  -8.8797888
## [31]  -9.9376266 -11.1745795 -12.5395304 -13.9486256 -15.2804168
## [36] -16.3913300 -17.1665125 -17.5812779 -17.7097183 -17.6695854
## [41] -17.5637030 -17.4562894 -17.3753921 -17.3239844 -17.2913960
## [46] -17.2628022 -17.2262162 -17.1767997 -17.1186074 -17.0640034
## [51] -17.0309217 -17.0382729 -17.1002705 -17.2209453 -17.3905128
## [56] -17.5857017 -17.7761429 -17.9367144 -18.0606665 -18.1649087
## [61] -18.2832201 -18.4521578 -18.6979308 -19.0286543 -19.4319317
## [66] -19.8765383 -20.3176913 -20.7053075 -20.9929698 -21.1442791
## [71] -21.1356798 -20.9585908 -20.6228760 -20.1586258 -19.6118560
## [76] -19.0349178 -18.4769322 -17.9782467 -17.5690679 -17.2703544
## [81] -17.0951532 -17.0493949 -17.1318377 -17.3332302 -17.6350503
## [86] -18.0085946 -18.4157858 -18.8133902 -19.1613062 -19.4324915
## [91] -19.6189508 -19.7292633 -19.7788462
  1. Differencing Data: much more stationary data and have surfaced a seasonal component = 7 seen on ACF peaks at 7, 14, 21
#b
inpos.diff = artrans.wge(us$positiveIncrease, phi.tr = 1, lag.max = 100)

  1. Seasonality Transformation: stationary data, and an ACF that reflects data other than white noise to be modeled.
#c
inpos.diff.seas = artrans.wge(inpos.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

  1. Model IDing: diagnose with aic.wge to determine phi, thetas: Both AIC/BIC select ARMA(4,2)
#d
aic5.wge(inpos.diff.seas)
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  aic
##       p    q        aic
## 15    4    2   15.62368
## 9     2    2   15.74701
## 18    5    2   15.74771
## 12    3    2   15.75351
## 7     2    0   15.75551
aic5.wge(inpos.diff.seas, type = "bic")
## ---------WORKING... PLEASE WAIT... 
## 
## 
## Five Smallest Values of  bic
##       p    q        bic
## 15    4    2   15.74832
## 2     0    1   15.79201
## 7     2    0   15.80893
## 4     1    0   15.81457
## 3     0    2   15.81522
  1. White Noise Test
#e
acf(inpos.diff.seas)

ljung.wge(inpos.diff.seas)$pval
## Obs -0.3711871 -0.02127283 0.07854862 0.05194601 -0.1044838 0.3276797 -0.4658847 0.1565306 0.1284186 -0.1060791 -0.07693327 0.1030723 -0.06391648 0.05002847 0.004471288 -0.03640442 -0.01551996 0.04133725 -0.01110058 -0.08153895 0.06570572 -0.03709739 -0.02390062 0.04588417
## [1] 1.246447e-12
  1. Estimate Phis, Thetas
  2. Model Building
#f
est.inpos.seas = est.arma.wge(inpos.diff.seas, p = 4, q = 2)
## 
## Coefficients of Original polynomial:  
## -1.8727 -1.4974 -0.4735 0.0283 
## 
## Factor                 Roots                Abs Recip    System Freq 
## 1+1.1293B+0.6983B^2   -0.8086+-0.8822i      0.8357       0.3681
## 1+0.7944B             -1.2588               0.7944       0.5000
## 1-0.0510B              19.6244               0.0510       0.0000
##   
## 
mean(us$positiveIncrease)
## [1] 22567.12
#g
  1. Forecast
  • 7 Day
#7 day
inpos.preds7 = fore.aruma.wge(us$positiveIncrease, phi = est.inpos.seas$phi, theta = est.inpos.seas$theta, d=1, s=7, n.ahead = 7)

  • 90 Day
#90 day
inpos.preds90 = fore.aruma.wge(us$positiveIncrease, phi = est.inpos.seas$phi, theta = est.inpos.seas$theta, d=1, s=7, n.ahead = 90)

  1. Plotting forecasts
  • 7 Day Forecast
#7 day forecast
plot(seq(1,187,1), us$positiveIncrease, type = "l", xlim = c(0,195), ylim = c(0,80000), ylab = "Increase in COVID Cases", main = "7 Day Increase in COVID Cases Forecast")
lines(seq(188, 194,1), inpos.preds7$f, type = "l", col = "red")

  • 90 Day Forecast
#90 day forecast
plot(seq(1,187,1), us$positiveIncrease, type = "l", xlim = c(0,280), ylim = c(0,80000), ylab = "Increase in COVID Cases", main = "90 Day Increase in COVID Cases Forecast")
lines(seq(188, 277,1), inpos.preds90$f, type = "l", col = "red")

Forecast Dependent Variable: MLR w/ Correlated Errors for Currently Hospitalized Patients using Positive Increase

  1. Data prep and recognizing differencing and seasonal components of Currently Hospitalized
#Selecting only those dates with reported current hospitilizations
us <- dplyr::slice(us,56:187)
invisible(us)
#Stationarize Currently Hospitalized Variable
us.diff = artrans.wge(us$hospitalizedCurrently, phi.tr = 1, lag.max = 100)

acf(us.diff)
us.diff.seas = artrans.wge(us.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

acf(us.diff.seas)
#Stationarize Increase Positive Variable
inpos.diff = artrans.wge(us$positiveIncrease, phi.tr = 1, lag.max = 100)

acf(inpos.diff)
inpos.diff.seas = artrans.wge(inpos.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

acf(inpos.diff.seas)

  1. Fit simple regression model predicting hospitalizedCurrently with positiveIncrease.
mlr.fit = lm(us.diff.seas~inpos.diff.seas, data=cbind(data.frame(us.diff.seas), data.frame(inpos.diff.seas)))
summary(mlr.fit)
## 
## Call:
## lm(formula = us.diff.seas ~ inpos.diff.seas, data = cbind(data.frame(us.diff.seas), 
##     data.frame(inpos.diff.seas)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7149.3  -554.2    73.4   611.2  6506.7 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     -13.32614  143.23299  -0.093  0.92603   
## inpos.diff.seas   0.11733    0.04227   2.776  0.00637 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1595 on 122 degrees of freedom
## Multiple R-squared:  0.0594, Adjusted R-squared:  0.0517 
## F-statistic: 7.705 on 1 and 122 DF,  p-value: 0.006375
AIC(mlr.fit)
## [1] 2184.712
acf(mlr.fit$residuals)

plot(mlr.fit$residuals)

  1. Diagnose with aic.wge and see what the p, q would be with residuals from above model
  • Below we can see that our aic.wge function has selected an ARMA(1,2) model for modeling our cmort information.
mlr.phis= aic.wge(mlr.fit$residuals)
mlr.phis
## $type
## [1] "aic"
## 
## $value
## [1] 14.53403
## 
## $p
## [1] 5
## 
## $q
## [1] 2
## 
## $phi
## [1] -0.2465069 -0.4219100  0.3119255  0.2116968  0.2024803
## 
## $theta
## [1] -0.4657537 -0.9566014
## 
## $vara
## [1] 1803071
  1. Now forecast with ARIMA function with phi’s from above coefficients and ARMA(1,2) model,
mlr.fit.arima = arima(us$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = us$positiveIncrease)
## Warning in log(s2): NaNs produced
#AIC(mlr.fit.arima)
  1. Run test to see if data is white noise: confirmed white noise, continue forward
acf(mlr.fit.arima$resid) 

ltest1 = ljung.wge(mlr.fit.arima$resid) 
## Obs -0.006649913 -0.009334226 -0.01070918 0.01827832 -0.02771128 -0.01061308 -0.01123975 0.08781402 -0.05809279 0.07500195 0.05661161 -0.08527475 -0.007553036 0.1098068 0.002427647 -0.010962 -0.04242877 -0.001526285 -0.05612271 -0.03313627 0.02930685 -0.007307643 -0.06056842 -0.03342384
ltest1$pval
## [1] 0.999208
ltest2 = ljung.wge(mlr.fit.arima$resid, K= 48)
## Obs -0.006649913 -0.009334226 -0.01070918 0.01827832 -0.02771128 -0.01061308 -0.01123975 0.08781402 -0.05809279 0.07500195 0.05661161 -0.08527475 -0.007553036 0.1098068 0.002427647 -0.010962 -0.04242877 -0.001526285 -0.05612271 -0.03313627 0.02930685 -0.007307643 -0.06056842 -0.03342384 0.08407797 -0.006875273 -0.002312484 0.06777622 -0.09591675 -0.06198868 -0.01331578 -0.03441461 -0.04091228 0.0270524 0.06615092 0.06538511 -0.09261011 -0.04939741 -0.03187124 -0.04934637 -0.01834029 -0.02031817 -0.001873916 0.02337696 -0.004828854 0.006148926 -0.002455308 0.004810674
ltest2$pval
## [1] 0.9999866
  1. Load the forecasted increase positive in a data frame
  • 7 Day
#7 Day Case Increase
regs7 = data.frame(positiveIncrease = inpos.preds7$f)
invisible(regs7)
  • 90 Day
#90 Day Case Increase
regs90 = data.frame(positiveIncrease = inpos.preds90$f)
invisible(regs90)
  1. Predictions
  • 7 Day
mlr1.preds7 = predict(mlr.fit.arima, newxreg = regs7, n.ahead = 7)
invisible(mlr1.preds7)
  • 90 Day
mlr1.preds90 = predict(mlr.fit.arima, newxreg = regs90, n.ahead =90)
invisible(mlr1.preds90)
  1. Plotted Forecasts
  • 7 Day
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l",  xlim = c(0,140), ylim = c(0,60000), ylab = "Currently Hospitalized COVID Cases", main = "7 Day Forecast for COVID Hospitalized Cases")
lines(seq(133,139,1), mlr1.preds7$pred, type = "l", col = "red")

  • 90 Day
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l",  xlim = c(0,223), ylim = c(0,60000), ylab = "Currently Hospitalized COVID Cases", main = "90 Day Forecast for COVID Hospitalized Cases")
lines(seq(133,222,1), mlr1.preds90$pred, type = "l", col = "red")

  1. ASE
  • 7-Day: 631,469.8
mlr.train7 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:125], positiveIncrease = us$positiveIncrease[1:125])
mlr.test7 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[126:132], positiveIncrease = us$positiveIncrease[126:132])

fit7 = arima(mlr.train7$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = mlr.train7$positiveIncrease)
fit7
## 
## Call:
## arima(x = mlr.train7$hospitalizedCurrently, order = c(5, 1, 2), seasonal = list(order = c(1, 
##     0, 0), period = 7), xreg = mlr.train7$positiveIncrease)
## 
## Coefficients:
##          ar1      ar2      ar3     ar4     ar5      ma1     ma2    sar1
##       1.5532  -0.9556  -0.0329  0.1479  0.1442  -1.3595  0.9998  0.1704
## s.e.  0.0933   0.1667   0.1881  0.1699  0.0981   0.0402  0.0478  0.1017
##       mlr.train7$positiveIncrease
##                            0.0977
## s.e.                       0.0307
## 
## sigma^2 estimated as 1188841:  log likelihood = -1045.66,  aic = 2111.32
next7 = data.frame(positiveIncrease = mlr.test7$positiveIncrease)
next7
##   positiveIncrease
## 1            56971
## 2            63642
## 3            69150
## 4            71027
## 5            75193
## 6            65413
## 7            61713
mlr.7.preds = predict(fit7, newxreg = next7, n.ahead = 7)
mlr.7.preds
## $pred
## Time Series:
## Start = 126 
## End = 132 
## Frequency = 1 
## [1] 57339.81 58363.17 59186.53 59836.05 60501.54 59898.44 59601.92
## 
## $se
## Time Series:
## Start = 126 
## End = 132 
## Frequency = 1 
## [1] 1097.598 1716.760 2415.802 3162.212 3946.967 4790.285 5669.052
ASEmlr7 = mean((mlr.test7$hospitalizedCurrently - mlr.7.preds$pred)^2)
ASEmlr7
## [1] 631469.8
  • 90-Day: 567,209,810
  • It would not be a statistically sound decision to run an ASE for a 90 day forecast. The data set would be trained on 20% of the data and tested on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts but have produced the ASE for reference below.
mlr.train90 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:42], positiveIncrease = us$positiveIncrease[1:42])
mlr.test90 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[43:132], positiveIncrease = us$positiveIncrease[43:132])

fit90 = arima(mlr.train90$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = mlr.train90$positiveIncrease)
fit90
## 
## Call:
## arima(x = mlr.train90$hospitalizedCurrently, order = c(5, 1, 2), seasonal = list(order = c(1, 
##     0, 0), period = 7), xreg = mlr.train90$positiveIncrease)
## 
## Coefficients:
##          ar1     ar2     ar3     ar4      ar5     ma1     ma2    sar1
##       -0.739  0.4956  0.8542  0.0232  -0.3749  1.0459  0.4434  0.6831
## s.e.   0.331  0.4000  0.2116  0.2977   0.2003  0.3550  0.4646  0.1296
##       mlr.train90$positiveIncrease
##                              0.121
## s.e.                         0.060
## 
## sigma^2 estimated as 1546410:  log likelihood = -352.66,  aic = 725.32
next90 = data.frame(positiveIncrease = mlr.test90$positiveIncrease)
invisible(next90)

mlr.90.preds = predict(fit90, newxreg = next90, n.ahead = 90)
invisible(mlr.90.preds)

ASEmlr90 = mean((us$hospitalizedCurrently[43:132]-mlr.90.preds$pred)^2)
ASEmlr90
## [1] 567209810

Forecast Dependent Variable: MLR w/ Correlated Errors for Currently Hospitalized Patients w/ Positive Increase & Trend

  1. Fit simple regression model predicting hospitalizedCurrently with positiveIncrease and trend
#creating trend
time <- seq(1,124,1)
#fitting model
mlr.fit.t = lm(us.diff.seas~inpos.diff.seas+time, data=cbind(data.frame(us.diff.seas), data.frame(inpos.diff.seas)))
summary(mlr.fit.t)
## 
## Call:
## lm(formula = us.diff.seas ~ inpos.diff.seas + time, data = cbind(data.frame(us.diff.seas), 
##     data.frame(inpos.diff.seas)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7111.0  -524.5    73.9   617.7  6540.6 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      32.35112  289.27171   0.112  0.91114   
## inpos.diff.seas   0.11724    0.04244   2.762  0.00663 **
## time             -0.73096    4.01658  -0.182  0.85590   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1601 on 121 degrees of freedom
## Multiple R-squared:  0.05966,    Adjusted R-squared:  0.04412 
## F-statistic: 3.839 on 2 and 121 DF,  p-value: 0.02419
AIC(mlr.fit.t)
## [1] 2186.678
acf(mlr.fit.t$residuals)

plot(mlr.fit.t$residuals)

  1. Diagnose with aic.wge and see what the p, q would be with residuals from above model
  • Below we can see that our aic.wge function has selected an ARMA(5,2) model for modeling our currently hospitalized information.
mlr.phis= aic.wge(mlr.fit.t$residuals)
mlr.phis
## $type
## [1] "aic"
## 
## $value
## [1] 14.53329
## 
## $p
## [1] 5
## 
## $q
## [1] 2
## 
## $phi
## [1] -0.2466126 -0.4220817  0.3118538  0.2119376  0.2027949
## 
## $theta
## [1] -0.4655288 -0.9564592
## 
## $vara
## [1] 1801724
  1. Now forecast with arima function with phi’s from above coefficients and ARMA(1,2) model.
Time <- seq(1,132,1)
mlr.fit.arima.t = arima(us$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = cbind(us$positiveIncrease, Time))
#AIC(mlr.fit.arima.t)
  1. Run test to see if data is white noise; confirmed white noise continue forward.
acf(mlr.fit.arima.t$resid) 

ltest1 = ljung.wge(mlr.fit.arima.t$resid) 
## Obs 0.002267493 0.004743329 0.004085712 0.008584111 0.003613306 0.03592544 -0.02552583 0.08439651 -0.1060931 0.01276058 0.02640347 -0.06061388 0.05268146 0.1458814 0.008052735 -0.04255608 -0.0935536 -0.04208266 -0.0498013 -0.001717023 0.05934226 0.01341049 -0.07137289 -0.07279666
ltest1$pval
## [1] 0.9823161
ltest2 = ljung.wge(mlr.fit.arima.t$resid, K= 48)
## Obs 0.002267493 0.004743329 0.004085712 0.008584111 0.003613306 0.03592544 -0.02552583 0.08439651 -0.1060931 0.01276058 0.02640347 -0.06061388 0.05268146 0.1458814 0.008052735 -0.04255608 -0.0935536 -0.04208266 -0.0498013 -0.001717023 0.05934226 0.01341049 -0.07137289 -0.07279666 0.05320601 -0.01313386 0.01794322 0.1027134 -0.07595104 -0.06199621 -0.03306343 -0.06206906 -0.03944973 0.03988941 0.07572305 0.07297186 -0.1029145 -0.06857141 -0.04403521 -0.04289784 0.005428483 -0.003871671 -7.833593e-05 0.005208028 -0.03085214 -0.01036108 0.00645461 0.02868683
ltest2$pval
## [1] 0.9990337
  1. Load the forecasted increase positive in a data frame
  • 7 Day
#7 Day Case Increase
regs7t = data.frame(cbind(positiveIncrease = inpos.preds7$f, Time = seq(133,139)))
invisible(regs7)
  • 90 Day
#90 Day Case Increase
regs90t = data.frame(cbind(positiveIncrease = inpos.preds90$f, Time = seq(133,222)))
invisible(regs90)
  1. Predictions
  • 7 Day
mlr1.preds7.t = predict(mlr.fit.arima.t, newxreg = regs7t, n.ahead = 7)
invisible(mlr1.preds7.t)
  • 90 Day
mlr1.preds90.t = predict(mlr.fit.arima.t, newxreg = regs90t, n.ahead =90)
invisible(mlr1.preds90.t)
  1. Plotted Forecasts
  • 7 Day
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l",  xlim = c(0,140), ylim = c(0,60000), ylab = "Currently Hospitalized COVID Cases", main = "7 Day Forecast for COVID Hospitalized Cases")
lines(seq(133,139,1), mlr1.preds7.t$pred, type = "l", col = "red")

  • 90 Day
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l",  xlim = c(0,223), ylim = c(0,85000), ylab = "Currently Hospitalized COVID Cases", main = "90 Day Forecast for COVID Hospitalized Cases")
lines(seq(133,222,1), mlr1.preds90.t$pred, type = "l", col = "red")

  1. ASE
  • 7-Day: 1,449,565
mlr.t.train7 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:125], positiveIncrease = us$positiveIncrease[1:125])
mlr.t.test7 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[126:132], positiveIncrease = us$positiveIncrease[126:132])

fit.t7 = arima(mlr.t.train7$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = cbind(mlr.t.train7$positiveIncrease, Time[1:125]))
fit.t7
## 
## Call:
## arima(x = mlr.t.train7$hospitalizedCurrently, order = c(5, 1, 2), seasonal = list(order = c(1, 
##     0, 0), period = 7), xreg = cbind(mlr.t.train7$positiveIncrease, Time[1:125]))
## 
## Coefficients:
##          ar1      ar2      ar3     ar4     ar5      ma1     ma2    sar1
##       1.5478  -0.9545  -0.0374  0.1514  0.1369  -1.3599  1.0000  0.1687
## s.e.  0.0934   0.1663   0.1879  0.1699  0.0988   0.0401  0.0472  0.1016
##       cbind(mlr.t.train7$positiveIncrease, Time[1:125])1
##                                                   0.0979
## s.e.                                              0.0307
##       cbind(mlr.t.train7$positiveIncrease, Time[1:125])2
##                                                 405.4581
## s.e.                                            456.0831
## 
## sigma^2 estimated as 1182658:  log likelihood = -1045.3,  aic = 2112.59
next.t7 = data.frame(positiveIncrease = mlr.t.test7$positiveIncrease, Time = Time[126:132])
next.t7
##   positiveIncrease Time
## 1            56971  126
## 2            63642  127
## 3            69150  128
## 4            71027  129
## 5            75193  130
## 6            65413  131
## 7            61713  132
mlr.t.7.preds = predict(fit.t7, newxreg = next.t7, n.ahead = 7)
mlr.t.7.preds
## $pred
## Time Series:
## Start = 126 
## End = 132 
## Frequency = 1 
## [1] 57428.02 58546.62 59497.98 60294.34 61128.42 60713.12 60628.56
## 
## $se
## Time Series:
## Start = 126 
## End = 132 
## Frequency = 1 
## [1] 1094.813 1707.677 2395.330 3123.203 3885.663 4701.485 5547.871
ASEmlr7 = mean((mlr.t.test7$hospitalizedCurrently - mlr.t.7.preds$pred)^2)
ASEmlr7
## [1] 1449565
  • 90-Day: 2,867,623,021
  • It would not be a statistically sound decision to run an ASE for a 90 day forecast. The data set would be trained on 20% of the data and tested on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts but have produced the ASE for reference below.
mlr.t.train90 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:42], positiveIncrease = us$positiveIncrease[1:42])
mlr.t.test90 <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[43:132], positiveIncrease = us$positiveIncrease[43:132])

fit.t90 = arima(mlr.t.train90$hospitalizedCurrently, order = c(5,1,2), seasonal = list(order = c(1,0,0), period = 7), xreg = cbind(mlr.t.train90$positiveIncrease, Time[1:42]))
fit.t90
## 
## Call:
## arima(x = mlr.t.train90$hospitalizedCurrently, order = c(5, 1, 2), seasonal = list(order = c(1, 
##     0, 0), period = 7), xreg = cbind(mlr.t.train90$positiveIncrease, Time[1:42]))
## 
## Coefficients:
##           ar1     ar2     ar3      ar4      ar5     ma1     ma2    sar1
##       -0.7190  0.5102  0.8233  -0.0017  -0.3785  1.0177  0.4052  0.6661
## s.e.   0.3626  0.4235  0.2419   0.3069   0.1977  0.3973  0.5122  0.1392
##       cbind(mlr.t.train90$positiveIncrease, Time[1:42])1
##                                                   0.1229
## s.e.                                              0.0607
##       cbind(mlr.t.train90$positiveIncrease, Time[1:42])2
##                                                 841.1172
## s.e.                                           1396.5187
## 
## sigma^2 estimated as 1548026:  log likelihood = -352.5,  aic = 727
next.t90 = data.frame(positiveIncrease = mlr.t.test90$positiveIncrease, Time = Time[43:132])
invisible(next.t90)

mlr.t.90.preds = predict(fit.t90, newxreg = next.t90, n.ahead = 90)
invisible(mlr.t.90.preds)

ASEmlr90 = mean((mlr.t.test90$hospitalizedCurrently - mlr.t.90.preds$pred)^2)
ASEmlr90
## [1] 2867623021

Forecast Dependent Variable: MLR w/ Correlated Errors (lagged variables) for Currently Hospitalized Patients w/ Positive Increase & Trend

With a quick check, we can see that there is no lag correlation between the increase of COVID patients and hospitalized patients, like we theorized there might be. So we will not model an MLR w/ correlated errors on lagged variables.

ccf(us$positiveIncrease,us$hospitalizedCurrently)

Multivariate VAR Model

Since we are working with a seasonal and transformation component, but it requires an additional transformation for the total positive COVID cases to become stationary for evaluation, we will only use positive increase of cases to predict currently hospitalized cases for the VAR model.

  1. Differenced and Seasonal Transformation VAR Model
#Positive Cases Transformations
#pos.diff = artrans.wge(us$positive, phi.tr = 1, lag.max = 100)
#pos.diff.2 = artrans.wge(pos.diff, phi.tr = 1, lag.max = 100)
#pos.trans = artrans.wge(pos.diff.2,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)
#Increase Positive Transformation
inpos.diff = artrans.wge(us$positiveIncrease, phi.tr = 1, lag.max = 100)

inpos.trans = artrans.wge(inpos.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

#Current Hospitalization Transformations
us.diff = artrans.wge(us$hospitalizedCurrently, phi.tr = 1, lag.max = 100)

currhosp.trans = artrans.wge(us.diff,phi.tr = c(0,0,0,0,0,0,1), lag.max = 100)

  1. Use VAR select to find best model and fit model: p = 8 for lowest AIC
#VARselect to choose lag
VARselect(cbind(currhosp.trans, inpos.trans), lag.max = 10, type= "both")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      7      7      2      7 
## 
## $criteria
##                   1            2            3            4            5
## AIC(n) 3.093857e+01 3.083329e+01 3.079091e+01 3.076976e+01 3.069554e+01
## HQ(n)  3.101650e+01 3.095018e+01 3.094676e+01 3.096458e+01 3.092932e+01
## SC(n)  3.113058e+01 3.112131e+01 3.117493e+01 3.124979e+01 3.127158e+01
## FPE(n) 2.731960e+13 2.459304e+13 2.357880e+13 2.309552e+13 2.145770e+13
##                   6            7            8            9           10
## AIC(n) 3.066656e+01 3.051427e+01 3.053424e+01 3.059674e+01 3.059649e+01
## HQ(n)  3.093930e+01 3.082598e+01 3.088492e+01 3.098638e+01 3.102509e+01
## SC(n)  3.133860e+01 3.128233e+01 3.139830e+01 3.155681e+01 3.165257e+01
## FPE(n) 2.086403e+13 1.793906e+13 1.833019e+13 1.955149e+13 1.959499e+13
#fit model
usdiff.var.fit = VAR(cbind(currhosp.trans, inpos.trans), type = "both", p = 2)
#AIC: 30.51427
  1. Predictions for Difference
  • 7 Day
#7 day predictions
pred.var7 = predict(usdiff.var.fit, n.ahead = 7)
pred.var7$fcst$currhosp.trans[,1:3]
##            fcst     lower    upper
## [1,]  -42.12350 -2942.410 2858.163
## [2,] -385.17371 -3346.493 2576.145
## [3,]  -80.65766 -3262.529 3101.214
## [4,] -124.14161 -3327.324 3079.041
## [5,]  -46.62794 -3284.538 3191.282
## [6,]  -43.58380 -3288.207 3201.039
## [7,]  -11.23201 -3262.277 3239.813
  • 90 Day
#90 day predictions
pred.var90 = predict(usdiff.var.fit, n.ahead = 90)
invisible(pred.var90$fcst$currhosp.trans)
  1. Calculate Actual Forecasts from Predicted Differences
  • 7 Day
startingPoints7 = us$hospitalizedCurrently[126:132]
currHospForecasts7 = pred.var7$fcst$currhosp.trans[,1:3] + startingPoints7
  • 90 Day
startingPoints90 = us$hospitalizedCurrently[43:132]
currHospForecasts90 = pred.var90$fcst$currhosp.trans[,1:3] + startingPoints90
  1. Plotting Forecasts
  • 7 Day
#7 day Forecasts
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l", xlim = c(0,139), ylim = c(0,62000), ylab = "Currently Hospitalized COVID Patients", main = "7 Day Currently Hospitalized Patients Forecast")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$fcst, type = "l", col = "red")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$upper, type = "l", col = "blue")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$lower, type = "l", col = "blue")

- 90 Day

#90 day Forecasts
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l", xlim = c(0,222), ylim = c(0,62000), ylab = "Currently Hospitalized COVID Patients", main = "90 Day Currently Hospitalized Patients Forecast")
lines(seq(133,222,1), as.data.frame(currHospForecasts90)$fcst, type = "l", col = "red")
lines(seq(133,222,1), as.data.frame(currHospForecasts90)$upper, type = "l", col = "blue")
lines(seq(133,222,1), as.data.frame(currHospForecasts90)$lower, type = "l", col = "blue")

  1. ASE
  • 7-Day
varASE7 = mean((us$hospitalizedCurrently[126:132]-currHospForecasts7[1:7])^2)
varASE7
## [1] 25178.55
  • 90-Day
  • It would not be a statistically sound decision to run an ASE for a 90 day forecast. The data set would be trained on 20% of the data and tested on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts but have produced the ASE for reference below.
varASE7 = mean((us$hospitalizedCurrently[43:132]-currHospForecasts90[1:90])^2)
varASE7
## [1] 10791.59
Multilayer Perceptron (MLP) / Neural Network Model

Since we have been looking at additional regressors for all our above models and know that the best regressor to leverage will be positive increase of COVID cases. This regressor reflects similar behavior to that of current hospitalizations and can be model properly against hospitalizations. 1. Create Current Hospitalized ts variable

tUScurrhop = ts(us$hospitalizedCurrently)
  1. Create data frame of regressor: positive increase COVID cases variable
tUSx = data.frame(positiveIncrease = ts(us$positiveIncrease))
  1. Forecast of positive increase of COVID cases
  • 7 Day Forecast for Regressor
fit.mlp.new = mlp(ts(tUSx), reps = 50, comb = "mean")
mlp.new.fore7 = forecast(fit.mlp.new, h = 7)
invisible(mlp.new.fore7)
  • 90 Day Forecast for Regressor
mlp.new.fore90 = forecast(fit.mlp.new, h = 90)
invisible(mlp.new.fore90)
  1. Combine observed new cases + forecast new cases
  • 7 Day regressor var
new.regressor7 <- data.frame(c(us$positiveIncrease, mlp.new.fore7$mean))
invisible(new.regressor7)
  • 90 day regressor var
new.regressor90 <- data.frame(c(us$positiveIncrease, mlp.new.fore90$mean))
invisible(new.regressor90)
  1. Fit model for currently hospitalized w/ regressor of new cases
mlp.fit1 = mlp(tUScurrhop, xreg = tUSx, outplot = T, comb = "mean")

plot(mlp.fit1)

  1. Forecast w/ known Positive Increase Case Variable
  • Currently Hospitalized 7 Day Forecast w/ Positive Increase Regressor
currhosp.fore7 = forecast(mlp.fit1, h = 7, xreg = new.regressor7)
plot(currhosp.fore7)

- Currently Hospitalized 90 Day Forecast w/ Positive Increase Regressor

currhosp.fore90 = forecast(mlp.fit1, h = 90, xreg = new.regressor90)
plot(currhosp.fore90)

  1. ASE
  • 7-Day
#ASEmlr7 = mean((us$hospitalizedCurrently[126:132]-currhosp.fore7$mean)^2)
#ASEmlr7

tUScurrhop2 = ts(us$hospitalizedCurrently[1:125])
tUSx2 = data.frame(positiveIncrease = ts(us$positiveIncrease[1:125]))
fit.mlp.new2 = mlp(ts(tUSx2), reps = 50, comb = "mean")
mlp.new.fore7.2 = forecast(fit.mlp.new2, h = 7)
invisible(mlp.new.fore7.2)
new.regressor7.2 <- data.frame(c(us$positiveIncrease[1:125], mlp.new.fore7.2$mean))
invisible(new.regressor7.2)
mlp.fit1.2 = mlp(tUScurrhop2, xreg = tUSx2, comb = "mean")
#plot(mlp.fit1.2)
currhosp.fore7.2 = forecast(mlp.fit1.2, h = 7, xreg = new.regressor7.2)
#plot(currhosp.fore7.2)
ASEmlr7.2 = mean((us$hospitalizedCurrently[126:132]-currhosp.fore7.2$mean)^2)
ASEmlr7.2
## [1] 171151.4
  • 90-Day
  • It would not be a statistically sound decision to run an ASE for a 90 day forecast. The data set would be trained on 20% of the data and tested on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts but have produced the ASE for reference below.
#ASEmlr90 = mean((us$hospitalizedCurrently[43:132]-currhosp.fore90$mean)^2)
#ASEmlr90

tUScurrhop3 = ts(us$hospitalizedCurrently[1:42])
tUSx3 = data.frame(positiveIncrease = ts(us$positiveIncrease[1:42]))
fit.mlp.new3 = mlp(ts(tUSx3), reps = 50, comb = "mean")
mlp.new.fore7.3 = forecast(fit.mlp.new3, h = 90)
invisible(mlp.new.fore7.3)
new.regressor7.3 <- data.frame(c(us$positiveIncrease[1:42], mlp.new.fore7.3$mean))
invisible(new.regressor7.3)
mlp.fit1.3 = mlp(tUScurrhop3, xreg = tUSx2, comb = "mean")
#plot(mlp.fit1.3)
currhosp.fore7.3 = forecast(mlp.fit1.3, h = 90, xreg = new.regressor7.3)
#plot(currhosp.fore7.3)
ASEmlr7.3 = mean((us$hospitalizedCurrently[43:132]-currhosp.fore7.3$mean)^2)
ASEmlr7.3
## [1] 40431673008

MLP NN Model Analysis

We completed a default neural network model. With so many opportunities for how to actually tune neural network model we knew this would not be our best model in this case. So we moved forward with a hyper tuned neural network model that allows us to calculate many windowed ASEs and compare those model against each other.

Ensemble Model: Hyper Tuned Neural Network Model

We have leveraged the tswgewrapper code above to produce a hyperparameter tuned NN model for our ensemble model. This model will work as our higher functioning ensemble

#if (!require(devtools)) {
#  install.packages("devtools")
#}
#devtools::install_github("josephsdavid/tswgewrapped", build_vignettes = TRUE)
library(tswgewrapped)
  1. Train/ Test Data Sets
data_train.m <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[1:122], positiveIncrease = us$positiveIncrease[1:122])
data_test.m <- data.frame(hospitalizedCurrently = us$hospitalizedCurrently[123:132], positiveIncrease = us$positiveIncrease[123:132])
  1. Hyper tune parameters
# search for best NN hyperparameters in given grid
model.m = tswgewrapped::ModelBuildNNforCaret$new(data = data_train.m, var_interest = "hospitalizedCurrently",
                                               search = 'random', tuneLength = 5, parallel = TRUE,
                                               batch_size = 50, h = 7, m = 7,
                                               verbose = 1)
## Registered S3 method overwritten by 'lava':
##   method     from   
##   print.pcor greybox
## Aggregating results
## Selecting tuning parameters
## Fitting reps = 10, hd = 2, allow.det.season = TRUE on full training set
## - Total Time for training: : 50.664 sec elapsed
  1. The windowed ASEs associated with the grid of hyperparameters is shown in the table and heatmap below.
res.m <- model.m$summarize_hyperparam_results()
res.m
##   reps hd allow.det.season     RMSE      ASE   RMSESD    ASESD
## 1   10  2             TRUE 3103.429 13970241 2184.689 16296534
## 2   11  3            FALSE 3129.341 14942705 2380.109 19673017
## 3   16  1            FALSE 3559.638 16480775 2047.127 15988977
## 4   16  3             TRUE 3204.752 15391189 2373.358 19509361
## 5   22  3            FALSE 3268.191 16196849 2463.200 19979406
model.m$plot_hyperparam_results()

  1. Best Parameters shown in below table. The best hyperparameters based on this grid search are 10 repetitions and 2 hidden layers, and allow.det.season = TRUE .
best.m <- model.m$summarize_best_hyperparams()
best.m
##   reps hd allow.det.season
## 1   10  2             TRUE
  1. Windowed ASE of 13,970,241.
final.ase.m <- dplyr::filter(res.m, reps == best.m$reps &
                    hd == best.m$hd &
                    allow.det.season == best.m$allow.det.season)[['ASE']]
final.ase.m
## [1] 13970241
  1. Ensemble model characteristics and plot
# Final Model
caret_model.m = model.m$get_final_models(subset = 'a')
caret_model.m$finalModel
## MLP fit with 2 hidden nodes and 10 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3)
## 1 regressor included.
## - Regressor 1 lags: (1,3)
## Forecast combined using the median operator.
## MSE: 1065689.5459.
# Plot Final Model
plot(caret_model.m$finalModel)

  1. Forecasts
  • Model final with best hyper parameters from above and regressor of positive increase of COVID cases.
#Ensemble model
ensemble.mlp = mlp(tUScurrhop, xreg = tUSx, outplot = T, reps = 10, hd = 2, allow.det.season = T)

ensemble.mlp
## MLP fit with 2 hidden nodes and 10 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2,3)
## 1 regressor included.
## - Regressor 1 lags: (1,3)
## Forecast combined using the median operator.
## MSE: 1025155.9706.
#Plot ensemble model
plot(ensemble.mlp)

  • 7 Day
fore7.m = forecast(ensemble.mlp, xreg = new.regressor7, h=7)
plot(fore7.m)

  • 90 Day
fore90.m = forecast(ensemble.mlp, xreg = new.regressor90, h=90)
plot(fore90.m)

  1. ASE
  • 7-Day
ASEmlr7 = mean((us$hospitalizedCurrently[126:132]-fore7.m$mean)^2)
ASEmlr7
## [1] 2113141
  • 90-Day
  • It would not be a statistically sound decision to run an ASE for a 90-day forecast. The data set would be trained on 20% of the data and tested on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts. This point is proven by the extremely high ASE calculated below.
ASEmlr90 = mean((us$hospitalizedCurrently[43:132]-fore90.m$mean)^2)
ASEmlr90
## [1] 2239616945

Multivariate Model Analysis

We started our multivariate analysis using multiple regression with correlated errors models. We ended up producing two models, one with and without a trend. We predicted that a trend (or time) would be a deciding variable in which of these models would be outperform the other. However, we expected trend to be the better model. When we compared the two via their AICs, we found the MLR model without trend performed better both with an AIC and when we compared the ASEs between the two model types. When applying our domain knowledge, we did come to the conclusion that time would not necessarily be a strong determinant for the severity of COVID since we have yet to observe, and subsequently able to analyze, a full cycle of the virus effect on the US population. Once we are able to analyze data to this level, we would expect a time correlation and therefore lag/trend model performing better in the predictions of the virus’s severity and therefore hospitalized patients.

Next we completed a Vector AR (VAR) model analysis of our data. This modeling processes requires the data of both the independent and dependent variables to be stationary. Which means we are actually modeling on the residuals of the data. This also results in the predictions/forecasts being based on differences of the forecasts and therefore the forecasts have to be calculated based on the previous periods values. This modeling techniques is highly sensitive to the previously observed values, which in the case of COVID is essential for understanding the severity of COVID. Since we have yet to observe a full cycle the modeling for predicting hospitalized cases should be closely based on what we’ve previously observed. So far this is the better of the three models we’ve produced.

For our final 2 models we performed multilayered perceptron or neural network model. For our first model we used a mostly default hyper parameters for the model. The ASEs returned with cross validation are much higher than the VAR model. We see it in the billions for the 90-day window forecast. This is as expected since the NN model creates multiple scenarios and takes the average of those in order to forecast moving forward. This means even the highest and lowest are incorporated into the forecast. While some of the scenarios are not statistically likely the NN model is attempting to create a forecast that takes these possibilities into account. In order to help created a better neural network we created a hyper parameter turned model. This model produced ASEs much lower than then default parameter neural network model. The forecasts for the hyper tuned NN model are slightly less dispersed than the default NN model telling us that these predictions are more helpful. Hyper tuning the additional parameters allows the model to be closer in predicting future hospitalized cases. We performed windowed ASEs in order to pick the best hyper tuned parameters for our advanced neural network model.

Upon reviewing our 90-day forecasts and ASEs we have concluded it would not be a statistically sound decision to run a 90-day forecast and allow it to make long term decisions until we have more data. Upon performing 90-day forecasts we end up having to train the model on 20% of the data and test it on 80% of the data. This is the exact opposite of modeling a statistically sound time series for prediction. For our COVID predictions of severity we will advise to only to produce forecasts and ASE with 7-day forecasts but have produced the ASE for reference below. This applies to all of our model’s cross validation efforts and ASE calculations.

We have chosen two models to help us predict severity with the COVID hospitalized. We will leverage the vector AR model for short term forecasts This model allows us to base our predictions off of previously observed values. Since COVID has not completed a full cycle and has shown heavy wandering behavior in the realization, we feel using this method will be the closest in order to get us prepared for forecasts for our 7-day and doing short term planning for the hospitalizes supplies and staffing with prediction intervals to help us make sure we account for possible peaks and valleys in our forecast.

For our long term forecasting we have chosen to go with our hyper tuned parameter neural network model. This model is helpful for long term forecasting because it has the ability to adjust and reapply the most effective model based on the newest data input with daily updates. Since the multilayered perceptron neural network models are not based on a fixed distribution assumption, this model will not come with prediction intervals. But rather, will continue to calculate all of the probably outcomes and produce a mean forecast for us to base all planning. The hyper parameter turned neural network model takes into account multiple possibilities and therefore does give us the most statistically useful forecast for our 90-day period.

It is essential both of these models be updated daily. COVID cases and hospitalizations changes frequently and the only way to continue to forecast and prepare as effectively as possible is to have updated models with as much historical data being taken into account as possible. As we move forward with presenting these models we’ve deicded on, it is important to remember these just appear to be the most useful models, and while we can work from them, none of them will be correct.

#VAR 7-Day Forecasts w/ Prediction Intervals
plot(seq(1,132,1), us$hospitalizedCurrently, type = "l", xlim = c(0,139), ylim = c(0,62000), ylab = "Currently Hospitalized COVID Patients", main = "7 Day Currently Hospitalized Patients Forecast")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$fcst, type = "l", col = "red")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$upper, type = "l", col = "blue")
lines(seq(133,139,1), as.data.frame(currHospForecasts7)$lower, type = "l", col = "blue")

#Hyper Tuned Parameter Neural Network Model Forecast w/ Prediction Intervals
fore90.m = forecast(ensemble.mlp, xreg = new.regressor90, h=90, )
plot(fore90.m)

Multivariate Model Performance Breakdown

Multiple Regression w/ Correlated Errors w/o Trend

  • AIC: 2184.712
  • 7-Day ASE
    • 631,469.8
  • 90-Day ASE
    • 567,209,810

Multiple Regression w/ Correlated Errors w/ Trend

  • AIC: 2186.678
  • 7-Day ASE
    • 1,449,565
  • 90-Day ASE
    • 2,867,623,021

Vector AR Model

  • 7-Day ASE
    • 25,178.55
  • 90-Day ASE
    • 10,791.59

Multilayered Perceptron / Neural Network Model

  • 7-Day ASE
    • 4,068,222
  • 90-Day ASE
    • 17,516,230,847

Ensemble Model: Hyper Tuned Neural Network Model

  • 7-Day ASE
    • 2,126,994
  • 90-Day ASE
    • 2,239,014,808
  • 7-Day Windowed ASE
    • 13,970,241